Option panel : Spider
- Accept cookies
Accept cookies generated by the remote server.
If you do not accept cookies, some "session-generated" pages (pages that depend on a session cookie) will not be retrieved.
- Check document type
Defines when the engine has to check the document type.
The engine must know the document type in order to give each saved file the right extension. For example, if a link called /cgi-bin/gen_image.cgi generates a GIF image, the saved file will be called "gen_image.gif" rather than "gen_image.cgi".
Avoid "never", because the local mirror could end up broken.
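The renaming described for "Check document type" can be illustrated with a minimal sketch. This is not the engine's actual code: the function name `local_name` and the small `EXTENSIONS` table are hypothetical, and the sketch assumes the document type is taken from the server's Content-Type header.

```python
from pathlib import PurePosixPath

# Hypothetical, deliberately tiny table mapping MIME types to extensions.
EXTENSIONS = {
    "image/gif": ".gif",
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "text/html": ".html",
}

def local_name(url_path: str, content_type: str) -> str:
    """Return the name to save a file under, based on its reported type."""
    name = PurePosixPath(url_path).name or "index"
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    ext = EXTENSIONS.get(mime)
    if ext is None:
        # Unknown type: keep the original name unchanged.
        return name
    return PurePosixPath(name).with_suffix(ext).name

print(local_name("/cgi-bin/gen_image.cgi", "image/gif"))
```

If the type is never checked, the file keeps the name taken from the link ("gen_image.cgi" here), which is why "never" can leave files that local viewers do not recognize.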
- Parse java files
Should the engine parse .java files (Java classes) to look for the filenames they include?
This option is checked by default.
- Spider
Should the engine follow the remote robots.txt rules when they exist?
The default is "follow".
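As an illustration of what following robots.txt rules means, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not how the engine itself is implemented; the function name `allowed`, the `follow_rules` flag, and the user agent string "MyMirror" are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, user_agent: str, url: str,
            follow_rules: bool = True) -> bool:
    """Decide whether a URL may be fetched under the given robots.txt text."""
    if not follow_rules:
        # Rules are ignored entirely: every URL is considered fetchable.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

rules = "User-agent: *\nDisallow: /private/\n"
print(allowed(rules, "MyMirror", "http://example.com/private/a.html"))
print(allowed(rules, "MyMirror", "http://example.com/public/a.html"))
```

With the rules followed, URLs under /private/ are skipped; with `follow_rules=False` they would be fetched as well.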
- Tolerant requests
Tolerate wrong file sizes, and make requests compatible with old servers.
It is unchecked by default, because this option can result in bogus (incomplete or corrupted) files in the mirrored site.
- Force old HTTP/1.0 requests
This option forces the engine to use HTTP/1.0 requests and to avoid HEAD requests.
It is useful for some sites with old server versions, or with many dynamically generated pages.
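To show what the difference looks like on the wire, here is a minimal sketch that builds the raw request text. It only illustrates the HTTP request format and is not the engine's code; the function name `build_request` and the `force_http10` parameter are hypothetical.

```python
def build_request(host: str, path: str, probe: bool = False,
                  force_http10: bool = False) -> bytes:
    """Build a raw HTTP request for a page, or for a header-only probe."""
    if force_http10:
        # Old-style request: always GET, never HEAD, protocol HTTP/1.0.
        method, version = "GET", "HTTP/1.0"
    else:
        method = "HEAD" if probe else "GET"
        version = "HTTP/1.1"
    lines = [
        f"{method} {path} {version}",
        f"Host: {host}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_request("example.com", "/index.html", probe=True, force_http10=True))
```

Some old servers, and some scripts that generate pages dynamically, handle HEAD poorly, which is one reason a plain HTTP/1.0 GET can be more reliable for them.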